Enable const folding transformation for QNN operations #9164
Conversation
Signed-off-by: Alexander Peskov <peskovnn@gmail.com>
Thanks @apeskov!
It is an interesting change.
However, looking at the test case, I am struggling to understand why this cannot be achieved by running Legalize before the FoldConstant pass. Would you be able to share some motivation?
Reviewed code from the test case:

    func = relay.Function([x], add)
    return func

    zz = run_opt_pass(before(), transform.FoldConstant())
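For context, a minimal self-contained sketch of a test in this shape, assuming TVM's QNN API; the shapes, qparams, and the local `run_opt_pass` helper are illustrative, not copied from the PR:

```python
import numpy as np
import tvm
from tvm import relay
from tvm.relay import transform

def before():
    # x stays a free variable; the weight goes through const -> qnn.quantize
    x = relay.var("x", shape=(2, 2), dtype="int8")
    w = relay.const(np.random.uniform(size=(2, 2)).astype("float32"))
    q = relay.qnn.op.quantize(
        w,
        output_scale=relay.const(0.5),
        output_zero_point=relay.const(0),
        out_dtype="int8",
    )
    add = relay.add(x, q)
    func = relay.Function([x], add)
    return func

def run_opt_pass(expr, opt_pass):
    mod = tvm.IRModule.from_expr(expr)
    return opt_pass(mod)["main"]

# With this PR, FoldConstant should replace the const -> qnn.quantize
# subgraph with a single pre-quantized int8 constant.
zz = run_opt_pass(before(), transform.FoldConstant())
```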
How is this different from running Legalize followed by FoldConstant?
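For reference, a sketch of the alternative being asked about, assuming the standard TVM QNN transforms; the exact pass choice (CanonicalizeOps vs. Legalize) is illustrative:

```python
import tvm
from tvm import relay

# Lower qnn.* ops into core Relay ops first, then fold constants.
# After lowering, the const -> qnn.quantize chain becomes plain
# arithmetic on constants, which FoldConstant already handles.
seq = tvm.transform.Sequential([
    relay.qnn.transform.CanonicalizeOps(),
    relay.transform.FoldConstant(),
])
```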
@apeskov Please see PR #9135. I understand why you want to do this, namely, to constant fold `const -> qnn.quantize` subgraphs.

@manupa-arm Running lowering before const fold is not acceptable when we want to keep the rest of the QNN graph intact (for BYOC), while selectively lowering constant subgraphs and evaluating them.
@masahi What is stopping us from running the legalization on the IRModule with just the external function? i.e., in the relay.ext.<codegen>
I'm thinking of use cases in BYOC where we want to pattern match against QNN ops, in which case we don't want to run QNN legalization. Not sure if this answers your question @manupa-arm

In my previous job, I directly took QNN subgraphs and sent them to external codegen. I believe ethos-N does something similar. We had to develop constant folding on the external codegen side.
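To illustrate the pattern-matching point, a sketch using TVM's dataflow pattern language; the specific pattern is illustrative and not from this PR:

```python
from tvm.relay.dataflow_pattern import is_op, wildcard

# BYOC partitioning patterns typically match QNN ops directly.
# qnn.conv2d takes data, weight, and four quantization parameters.
qnn_conv2d_pat = is_op("qnn.conv2d")(
    wildcard(), wildcard(), wildcard(),
    wildcard(), wildcard(), wildcard(),
)
# If Legalize runs first, qnn.conv2d is decomposed into nn.conv2d plus
# integer arithmetic, so this pattern no longer matches and the qparams
# are no longer directly readable from the graph.
```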
Hmmm, I was not suggesting to run the legalization before the partitioning. We could identify the patterns with QNN ops and then partition out the external function. So there are two places we could do this: 1) on 'main' before partitioning, or 2) on the partitioned external function (e.g., inside relay.ext.<codegen>).

I believe this particular requirement is to mutate only the external function (thus, we could do it in 2) and not 1)) and not 'main'. Therefore, why can't we achieve the same effect by running the two passes -- legalization + constant folding -- in 2)?
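A minimal sketch of option 2), assuming the usual BYOC flow where the codegen receives the partitioned function; the helper name is hypothetical:

```python
import tvm
from tvm import relay

def fold_in_external_func(func):
    """Apply Legalize + FoldConstant to just a partitioned function,
    leaving the rest of the module untouched."""
    mod = tvm.IRModule.from_expr(func)
    mod = relay.qnn.transform.Legalize()(mod)
    mod = relay.transform.FoldConstant()(mod)
    return mod["main"]
```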
Hmm interesting, I never thought about doing constant folding on partitioned functions. My use cases have always been doing constant folding on 'main' before partitioning.

In 2), if we run legalization on partitioned functions, wouldn't that decompose all QNN ops? I couldn't easily extract qparams anymore, for example. I needed to retain QNN ops all the way until I translated them to the external IR, so running legalization had never been an option for me. I did wish that we could selectively lower only const-foldable QNN subgraphs. Maybe I'm missing something.
I think the constant folding pass is supposed to work on the IRModule (with the external function). Therefore, everything in the IRModule will be affected. However, we could create IRModules with only what is in scope for the transformation.

It is about the further granularity one would want; today, I think we need to do further partitioning to achieve this. Whether we want annotations to block constant folding seems like an interesting but orthogonal conversation to this one. Within the scope of the changes in this PR, I feel it does the same thing (destroys QNN info in the process of constant folding). However, we could control what we pass into the constant folding pass.
@apeskov Please update or close this PR.
Closing due to inactivity, feel free to reopen.
Currently the sequence

    const -> qnn.quantize

is not treated as a constant subgraph. The suggestion is to allow the FoldConstant pass to replace this pattern with a single int8 constant tensor.

Reason: some BYOC runtimes may require weights to be plain constant tensors. The noted FoldConstant limitation may break the applicability of such BYOC runtimes.
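Numerically, folding `const -> qnn.quantize` amounts to pre-computing the affine quantization of the weight tensor. A sketch of the equivalent computation, assuming int8 output and per-tensor qparams:

```python
import numpy as np

def quantize_const(w, scale, zero_point):
    # qnn.quantize semantics for int8 output:
    # q = clip(round(w / scale) + zero_point, -128, 127)
    q = np.round(w / scale) + zero_point
    return np.clip(q, -128, 127).astype("int8")

# FoldConstant with this PR would replace the const -> qnn.quantize
# subgraph with a tensor precomputed like this:
w = np.random.uniform(-1, 1, size=(2, 2)).astype("float32")
folded = quantize_const(w, scale=0.5, zero_point=0)
```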